Lab 3: Principal Components Analysis (PCA)
Stat 154, Spring 2018

Introduction
The goal of this lab is to go over the various options and steps required to perform a Principal Components Analysis (PCA). You will also learn about the functions prcomp() and princomp(), and how to use their outputs to answer questions like:
• How many principal components to retain.
• How to visualize the observations.
• How to visualize the relationships among variables.
• How to visualize supplementary variables.
Dataset
NBA Teams
In this lab we are going to use a data set about NBA teams, containing per-game statistics from the 2016-2017 regular season. The corresponding CSV file is available in the data/ folder of the GitHub repo: https://github.com/ucb-stat154/stat154-spring-2018/tree/master/data
Your turn
Create a new data frame dat that contains the following columns:
• wins
• losses
• points
• field_goals
• points3
• free_throws
• off_rebounds
• def_rebounds
• assists
• steals
• blocks
• personal_fouls
# Download the CSV file from the course repository
repo <- 'https://github.com/ucb-stat154/stat154-spring-2018/'
csv_file <- 'raw/master/data/nba-teams-2017.csv'
url <- paste0(repo, csv_file)
download.file(url, destfile = 'nba-teams-2017.csv')

# Read it, keeping the team names as character strings
dataset <- read.csv('nba-teams-2017.csv', stringsAsFactors = FALSE)
str(dataset, vec.len = 1)
## 'data.frame': 30 obs. of 27 variables:
## $ team : chr "Golden State Warriors" ...
## $ games_played : int 82 82 ...
## $ wins : int 67 61 ...
## $ losses : int 15 21 ...
## $ win_prop : num 0.817 0.744 ...
## $ minutes : num 48.2 48.3 ...
## $ points : num 116 ...
## $ field_goals : num 43.1 39.3 ...
## $ field_goals_attempted: num 87.1 83.7 ...
## $ field_goals_prop : num 49.5 46.9 ...
## $ points3 : num 12 9.2 ...
## $ points3_attempted : num 31.2 23.5 ...
## $ points3_prop : num 38.3 39.1 ...
## $ free_throws : num 17.8 17.6 ...
## $ free_throws_att : num 22.6 22 ...
## $ free_throws_prop : num 78.8 79.7 ...
## $ off_rebounds : num 9.4 10 ...
## $ def_rebounds : num 35 33.9 ...
## $ rebounds : num 44.4 43.9 ...
## $ assists : num 30.4 23.8 ...
## $ turnovers : num 14.8 13.4 ...
## $ steals : num 9.6 8 ...
## $ blocks : num 6.8 5.9 ...
## $ block_fga : num 3.8 4.1 ...
## $ personal_fouls : num 19.3 18.3 ...
## $ personal_fouls_drawn : num 19.4 19.8 ...
## $ plus_minus : num 11.6 7.2 ...
# Keep only the 12 variables listed above
dat <- subset(dataset, select = c('wins', 'losses', 'points', 'field_goals',
                                  'points3', 'free_throws', 'off_rebounds',
                                  'def_rebounds', 'assists', 'steals',
                                  'blocks', 'personal_fouls'))
print(dat)
## wins losses points field_goals points3 free_throws off_rebounds
## 1 67 15 115.9 43.1 12.0 17.8 9.4
## 2 61 21 105.3 39.3 9.2 17.6 10.0
## 3 55 27 115.3 40.3 14.4 20.3 10.9
## 4 53 29 108.0 38.6 12.0 18.7 9.1
## 5 51 31 100.7 37.0 9.6 17.1 9.4
## 6 51 31 106.9 39.2 8.8 19.7 10.6
## 7 51 31 110.3 39.9 13.0 17.5 9.3
## 8 51 31 108.7 39.5 10.3 19.3 9.0
## 9 49 33 109.2 41.3 9.2 17.3 10.3
## 10 47 35 106.6 39.5 8.4 19.2 12.2
## 11 43 39 100.5 36.4 9.4 18.3 10.8
## 12 43 39 103.2 38.1 8.9 18.1 10.3
## 13 42 40 105.1 39.3 8.6 17.9 9.0
## 14 42 40 103.6 38.8 8.8 17.2 8.8
## 15 41 41 102.9 38.6 7.6 18.0 12.2
## 16 41 41 107.9 39.5 10.4 18.5 10.1
## 17 41 41 103.2 39.0 9.9 15.2 10.6
## 18 40 42 111.7 41.2 10.6 18.7 11.8
## 19 37 45 101.3 39.9 7.7 13.9 11.1
## 20 36 46 104.9 37.7 10.0 19.4 8.8
## 21 34 48 104.3 39.1 9.4 16.7 8.6
## 22 33 49 97.9 36.2 10.7 14.8 7.9
## 23 32 50 102.8 37.9 9.0 18.1 8.7
## 24 31 51 105.6 39.5 7.3 19.3 11.4
## 25 31 51 104.3 39.6 8.6 16.6 12.0
## 26 29 53 101.1 38.3 8.5 16.0 9.8
## 27 28 54 102.4 37.7 10.1 17.0 9.8
## 28 26 56 104.6 39.3 8.9 17.0 11.4
## 29 24 58 107.7 39.9 7.5 20.4 11.9
## 30 20 62 105.8 37.8 10.7 19.4 8.8
## def_rebounds assists steals blocks personal_fouls
## 1 35.0 30.4 9.6 6.8 19.3
## 2 33.9 23.8 8.0 5.9 18.3
## 3 33.5 25.2 8.2 4.3 19.9
## 4 32.9 25.2 7.5 4.1 20.6
## 5 33.8 20.1 6.7 5.0 18.8
## 6 32.6 18.5 8.3 4.9 20.8
## 7 34.4 22.7 6.6 4.0 18.1
## 8 34.0 22.5 7.5 4.2 19.8
## 9 32.6 23.9 8.5 4.1 21.3
## 10 34.4 21.0 7.9 5.0 20.9
## 11 32.0 21.3 8.0 4.2 22.4
## 12 34.1 23.6 8.2 4.8 18.2
## 13 33.0 22.5 8.2 5.0 19.5
## 14 31.6 24.2 8.1 5.3 20.2
## 15 34.1 22.6 7.8 4.8 17.7
## 16 33.5 21.1 7.0 5.0 21.2
## 17 33.0 21.2 7.2 5.7 20.5
## 18 34.6 25.3 6.9 3.9 19.1
## 19 34.6 21.1 7.0 3.8 17.9
## 20 34.8 23.1 7.0 4.8 16.6
## 21 35.1 22.8 7.8 5.5 18.2
## 22 30.7 20.8 7.5 3.7 19.1
## 23 32.3 22.5 7.6 4.0 20.3
## 24 31.0 23.7 8.0 4.5 20.1
## 25 33.2 21.8 7.1 5.5 20.3
## 26 33.3 22.2 7.1 4.8 19.3
## 27 33.0 23.8 8.4 5.1 21.9
## 28 32.1 20.9 8.2 3.9 20.7
## 29 33.1 19.6 8.2 4.9 24.8
## 30 35.1 21.4 7.2 4.7 21.0
Spend some time examining things like:
• descriptive statistics with summary().
summary(dat)
## wins losses points field_goals
## Min. :20.00 Min. :15.00 Min. : 97.9 Min. :36.20
## 1st Qu.:32.25 1st Qu.:31.50 1st Qu.:103.0 1st Qu.:38.15
## Median :41.00 Median :41.00 Median :105.0 Median :39.25
## Mean :41.00 Mean :41.00 Mean :105.6 Mean :39.05
## 3rd Qu.:50.50 3rd Qu.:49.75 3rd Qu.:107.8 3rd Qu.:39.58
## Max. :67.00 Max. :62.00 Max. :115.9 Max. :43.10
## points3 free_throws off_rebounds def_rebounds
## Min. : 7.30 Min. :13.90 Min. : 7.900 Min. :30.70
## 1st Qu.: 8.65 1st Qu.:17.02 1st Qu.: 9.025 1st Qu.:32.67
## Median : 9.30 Median :17.95 Median :10.050 Median :33.40
## Mean : 9.65 Mean :17.83 Mean :10.133 Mean :33.38
## 3rd Qu.:10.38 3rd Qu.:19.07 3rd Qu.:11.050 3rd Qu.:34.33
## Max. :14.40 Max. :20.40 Max. :12.200 Max. :35.10
## assists steals blocks personal_fouls
## Min. :18.50 Min. :6.600 Min. :3.700 Min. :16.60
## 1st Qu.:21.12 1st Qu.:7.125 1st Qu.:4.125 1st Qu.:18.88
## Median :22.50 Median :7.800 Median :4.800 Median :20.00
## Mean :22.63 Mean :7.710 Mean :4.740 Mean :19.89
## 3rd Qu.:23.77 3rd Qu.:8.200 3rd Qu.:5.000 3rd Qu.:20.77
## Max. :30.40 Max. :9.600 Max. :6.800 Max. :24.80
• univariate plots: boxplots, histograms, density curves.
# Boxplots of all variables (note that they are on different scales)
boxplot(dat, las = 2)

# One histogram per variable. freq = FALSE puts the histogram on the
# density scale, so the kernel density curve drawn on top is visible
# (on the default frequency scale the curve would be squashed near zero)
for (v in names(dat)) {
  hist(dat[[v]], freq = FALSE, main = v, xlab = v)
  lines(density(dat[[v]]))
}
• compute the correlation matrix.
print(cor(dat))
## wins losses points field_goals points3
## wins 1.00000000 -1.00000000 0.50752590 0.40747195 0.44681094
## losses -1.00000000 1.00000000 -0.50752590 -0.40747195 -0.44681094
## points 0.50752590 -0.50752590 1.00000000 0.81520604 0.57537543
## field_goals 0.40747195 -0.40747195 0.81520604 1.00000000 0.16940382
## points3 0.44681094 -0.44681094 0.57537543 0.16940382 1.00000000
## free_throws 0.13913686 -0.13913686 0.55875989 0.16018996 0.17182294
## off_rebounds -0.08560734 0.08560734 0.14922012 0.33705067 -0.37624968
## def_rebounds 0.21692340 -0.21692340 0.36992824 0.33683802 0.22765772
## assists 0.46268513 -0.46268513 0.57734309 0.52991280 0.46015123
## steals 0.27032897 -0.27032897 0.30982800 0.35777164 -0.05430218
## blocks 0.28131403 -0.28131403 0.15342092 0.27055358 -0.06847851
## personal_fouls -0.27270936 0.27270936 0.08680588 0.02144334 -0.12042193
## free_throws off_rebounds def_rebounds assists
## wins 0.13913686 -0.08560734 0.21692340 0.46268513
## losses -0.13913686 0.08560734 -0.21692340 -0.46268513
## points 0.55875989 0.14922012 0.36992824 0.57734309
## field_goals 0.16018996 0.33705067 0.33683802 0.52991280
## points3 0.17182294 -0.37624968 0.22765772 0.46015123
## free_throws 1.00000000 0.17192870 0.12770466 0.08978386
## off_rebounds 0.17192870 1.00000000 0.02385551 -0.16448415
## def_rebounds 0.12770466 0.02385551 1.00000000 0.23022655
## assists 0.08978386 -0.16448415 0.23022655 1.00000000
## steals 0.23062295 0.07718630 -0.19672636 0.42982794
## blocks -0.00406670 -0.02172908 0.30223031 0.30242149
## personal_fouls 0.31602643 0.27205446 -0.41289270 -0.26139863
## steals blocks personal_fouls
## wins 0.27032897 0.28131403 -0.27270936
## losses -0.27032897 -0.28131403 0.27270936
## points 0.30982800 0.15342092 0.08680588
## field_goals 0.35777164 0.27055358 0.02144334
## points3 -0.05430218 -0.06847851 -0.12042193
## free_throws 0.23062295 -0.00406670 0.31602643
## off_rebounds 0.07718630 -0.02172908 0.27205446
## def_rebounds -0.19672636 0.30223031 -0.41289270
## assists 0.42982794 0.30242149 -0.26139863
## steals 1.00000000 0.37392387 0.31375577
## blocks 0.37392387 1.00000000 -0.02772306
## personal_fouls 0.31375577 -0.02772306 1.00000000
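With 12 variables there are 66 distinct pairs, which is a lot to scan by eye. The sketch below ranks the pairs by the absolute value of their correlation. It uses only base R. `top_correlations()` is a hypothetical helper name, and the cutoff is an arbitrary choice. The block runs on the built-in `mtcars` data so that it is self-contained. On the lab data, `top_correlations(dat)` should list wins/losses first (r = -1), followed by points/field_goals (r ≈ 0.82).

```r
# Rank variable pairs by the absolute value of their correlation.
# top_correlations() is a hypothetical helper, not part of any package.
top_correlations <- function(df, cutoff = 0.5) {
  r <- cor(df)
  # Keep each pair once: upper triangle only, diagonal excluded
  idx <- which(upper.tri(r) & abs(r) >= cutoff, arr.ind = TRUE)
  i <- unname(idx[, 1])
  j <- unname(idx[, 2])
  out <- data.frame(
    var1 = rownames(r)[i],
    var2 = colnames(r)[j],
    r = r[cbind(i, j)],
    stringsAsFactors = FALSE
  )
  # Strongest relationships (positive or negative) first
  out <- out[order(-abs(out$r)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

pairs_mtcars <- top_correlations(mtcars, cutoff = 0.8)
print(pairs_mtcars)
```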
• get a scatterplot matrix with pairs()
pairs(dat)
Your turn
As we saw in lecture, the minimal output of a PCA procedure should consist of eigenvalues, loadings, and principal components. Create the following objects:
• eigenvalues: vector of eigenvalues (i.e. λ1, λ2, ...)
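# PCA on standardized variables, i.e. on the correlation matrix
pca_prcomp <- prcomp(dat, scale. = TRUE)
# prcomp() returns the standard deviations of the PCs; square them to get eigenvalues
eigenvalues <- pca_prcomp$sdev^2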
print(eigenvalues)
## [1] 4.164615e+00 2.061621e+00 1.377660e+00 1.339038e+00 9.445148e-01
## [6] 8.266370e-01 5.418803e-01 3.172210e-01 2.634804e-01 1.632044e-01
## [11] 1.273592e-04 6.167771e-33
• loadings: matrix of eigenvectors (i.e. V)
V<-pca_prcomp$rotation
print(V)
## PC1 PC2 PC3 PC4
## wins -0.398075412 0.15598612 0.10745878 -0.15334568
## losses 0.398075412 -0.15598612 -0.10745878 0.15334568
## points -0.419824132 -0.19938732 -0.30908139 0.07930596
## field_goals -0.357358372 -0.23821878 0.04168063 0.29261717
## points3 -0.282642201 0.24236778 -0.43852168 -0.26438584
## free_throws -0.167115047 -0.34510341 -0.42618447 -0.04954457
## off_rebounds 0.009704962 -0.44232712 0.01245106 0.45567381
## def_rebounds -0.212971258 0.19999106 -0.10297431 0.60152383
## assists -0.369015007 0.06739504 0.12123879 -0.10249238
## steals -0.207867923 -0.36794870 0.39186478 -0.33667376
## blocks -0.196357624 -0.05130561 0.55914063 0.13587072
## personal_fouls 0.088950346 -0.54661171 -0.11852570 -0.27733937
## PC5 PC6 PC7 PC8
## wins 0.468414493 -0.18979579 0.04365872 -0.04471559
## losses -0.468414493 0.18979579 -0.04365872 0.04471559
## points -0.095512938 0.11327669 0.07578747 -0.15149979
## field_goals 0.006748312 0.40970709 0.16943791 -0.45161451
## points3 -0.163151046 0.07974936 0.37045051 0.38506660
## free_throws -0.097697769 -0.52878918 -0.48681293 0.03206656
## off_rebounds 0.442773172 0.13341250 -0.01438259 0.59025255
## def_rebounds -0.266371336 -0.27622434 0.02534623 -0.12997347
## assists -0.330000660 0.40087945 -0.29590090 0.41909728
## steals -0.149511341 0.05253747 -0.29905033 -0.12759984
## blocks -0.334421944 -0.43700019 0.32729470 0.24784686
## personal_fouls -0.075831806 -0.11293000 0.55004993 -0.03383988
## PC9 PC10 PC11 PC12
## wins 0.03092493 0.147801412 -0.0005032429 7.071068e-01
## losses -0.03092493 -0.147801412 0.0005032429 7.071068e-01
## points -0.17172943 -0.199970573 -0.7496947197 -2.747802e-15
## field_goals -0.23247393 -0.069322551 0.5184251837 1.776357e-15
## points3 0.20677816 -0.388229297 0.2953012488 1.054712e-15
## free_throws -0.24426074 0.008749499 0.2863049501 9.853229e-16
## off_rebounds 0.13875304 -0.121463456 -0.0010947897 2.081668e-17
## def_rebounds 0.56616194 0.238503116 -0.0001195162 1.249001e-16
## assists -0.10854694 0.538040594 -0.0027449836 3.053113e-16
## steals 0.55994315 -0.331865117 -0.0001105887 -5.551115e-17
## blocks -0.34632118 -0.190812412 -0.0027385340 0.000000e+00
## personal_fouls 0.16456848 0.503039241 -0.0017430072 5.551115e-17
• scores: matrix of principal components (i.e. Z = XV, where X is the standardized data matrix)
Z<-pca_prcomp$x
print(Z)
## PC1 PC2 PC3 PC4 PC5 PC6
## 1 -7.11481922 -0.32475222 2.2905098 -0.34137621 -1.439762641 0.55967095
## 2 -2.14360862 0.97292384 1.8195990 0.08591542 0.862169896 -1.06432907
## 3 -3.86843431 -0.54512433 -2.3009107 -0.90366254 0.280477825 0.37529668
## 4 -1.55766480 0.74905508 -1.3461868 -1.66593211 0.325782424 0.14491866
## 5 0.91352401 2.18374798 0.2117517 0.05745074 1.166635570 -1.61695575
## 6 -0.28521066 -1.42113610 0.0763837 -0.59755356 1.482174192 -1.56463965
## 7 -1.69736079 2.23488894 -2.1612456 0.39672861 0.558873489 0.48670516
## 8 -1.30123596 0.52231238 -1.2228823 -0.35011719 0.387159395 -0.56072762
## 9 -1.43097081 -1.29564930 0.2210834 -0.75781908 0.778681084 1.42448842
## 10 -0.54648178 -1.52688188 0.1046333 1.25214874 1.062591040 -1.09772919
## 11 1.70208669 -0.87147140 -0.0982859 -1.74276279 1.089643984 -0.80186740
## 12 -0.11434088 0.53813869 0.7736702 0.25176282 -0.032864729 -0.39038642
## 13 -0.07617772 0.01529814 0.8697323 -0.55682931 -0.321647533 -0.20120494
## 14 0.19710891 0.04272970 1.4447019 -1.57614215 -0.425153814 0.23471436
## 15 0.45166980 -0.08074649 0.9137719 1.65808320 0.892597258 -0.18957926
## 16 -0.09477052 -0.29372442 -0.9534533 0.30653759 -0.002903886 -0.61308771
## 17 0.73274686 0.48408794 1.2000204 0.37582601 0.295813723 0.03415113
## 18 -1.55514793 -0.26225599 -1.9407432 1.91002123 0.177054600 1.45651783
## 19 1.68812268 1.48083835 0.6737650 2.22027091 1.207690640 1.70581745
## 20 0.21718653 2.05342377 -0.8576597 0.88526001 -1.128567884 -1.01172878
## 21 0.09852432 1.38211494 0.9610816 0.94561280 -1.727120433 -0.29377075
## 22 3.29018415 2.20511611 0.1090580 -2.65417910 0.228780523 1.05911971
## 23 1.76352352 0.22697261 -0.5008575 -1.17412293 -0.543573523 0.26926494
## 24 1.09720645 -2.07177935 0.3026713 -0.32079048 0.179893343 0.90005392
## 25 1.20967806 -0.76672601 0.7379663 1.60116378 -0.025882362 0.37837293
## 26 2.12261877 0.87179686 0.6038219 0.61698512 -0.574392528 0.49510854
## 27 1.25996525 -0.72251736 0.7001585 -1.02389729 -1.643494495 0.23030656
## 28 1.97563355 -1.54719449 -0.1739730 -0.05992243 0.144607196 1.42570137
## 29 1.51688073 -4.30908967 -0.6415354 0.54999932 -0.678984984 -0.78917196
## 30 1.54956375 0.07560368 -1.8166468 0.61134085 -2.576277370 -0.98503009
## PC7 PC8 PC9 PC10 PC11
## 1 0.317942817 -0.006314034 0.22210644 0.02331291 0.0037710703
## 2 -0.125615582 0.142212594 -0.23968151 0.09193448 0.0125315307
## 3 -0.023658569 0.907390055 0.34176752 -0.83888840 0.0080870617
## 4 0.046643073 0.381040431 -0.06350358 0.78971692 -0.0147287675
## 5 0.530069736 0.155521649 -0.15190088 0.25732680 0.0015446228
## 6 -0.019896926 -0.887517524 -0.02787462 -0.85975141 0.0075756971
## 7 0.718655029 -0.236896830 -0.17013618 -0.29438019 0.0078716269
## 8 -0.263399020 -0.933902443 -0.04135255 0.37419491 -0.0101032434
## 9 0.089330324 -1.027046922 0.26446260 0.37317237 -0.0191405061
## 10 -0.007721053 0.106781817 0.48425539 0.14712607 0.0013738371
## 11 0.005430565 0.883926221 0.79133583 0.59401213 0.0013764401
## 12 -1.277488808 0.344752999 0.66458940 -0.02225418 0.0007750084
## 13 -0.445845558 -0.839300437 -0.21075099 -0.14375804 0.0011500550
## 14 -0.103185515 -0.102851617 -0.87147836 0.23538112 -0.0043804662
## 15 -1.379921807 0.712018440 0.30594061 -0.15569631 -0.0180557545
## 16 1.148446203 -0.151702097 -0.64902616 0.03858305 0.0033038023
## 17 1.888211492 0.526346303 -0.34829490 -0.23930014 -0.0263530159
## 18 -0.330084529 0.396140448 -0.34024113 0.47401019 0.0045067576
## 19 0.214101899 -0.734317529 0.81161157 0.28643531 0.0154369416
## 20 -1.396191203 0.128766662 -0.43561512 -0.24386990 -0.0090153168
## 21 -0.002524130 -0.604504095 0.33500906 -0.29998999 -0.0013776825
## 22 0.208465643 0.008833207 0.15124148 -0.59221748 0.0031297100
## 23 -0.634579536 -0.448495837 -0.23524701 0.38305438 0.0238269356
## 24 -1.403633024 0.336472538 -1.32636846 -0.07902744 0.0027204653
## 25 0.959809753 0.811830290 -0.71619970 -0.17556800 0.0127101497
## 26 0.210830667 0.106777477 -0.27879892 0.22557987 -0.0022390876
## 27 0.409438020 0.792303333 0.80756402 0.17722898 0.0133777349
## 28 -0.167583342 -0.102485014 0.47565321 -0.76146442 -0.0147198399
## 29 0.590401425 -0.504715760 0.04974661 0.19062664 0.0043104563
## 30 0.243551958 -0.161064323 0.40118632 0.04446976 -0.0092662233
## PC12
## 1 1.632454e-15
## 2 8.659919e-16
## 3 5.348558e-16
## 4 5.762514e-16
## 5 2.688263e-16
## 6 -3.846643e-16
## 7 3.652996e-16
## 8 2.206574e-16
## 9 1.796518e-16
## 10 4.444802e-17
## 11 4.515247e-17
## 12 2.855369e-16
## 13 -9.364074e-17
## 14 8.295629e-17
## 15 2.613267e-17
## 16 -1.592937e-16
## 17 -1.423879e-16
## 18 3.062346e-16
## 19 -8.655010e-17
## 20 -2.681813e-17
## 21 -6.655394e-17
## 22 -5.867043e-16
## 23 -1.850838e-16
## 24 -4.803115e-16
## 25 -2.902691e-16
## 26 -2.309991e-16
## 27 -1.121738e-16
## 28 -8.560654e-16
## 29 -8.543717e-16
## 30 -3.928391e-16
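The three objects above can also be computed from first principles. Eigen-decompose the correlation matrix to get the eigenvalues and V, then project the standardized data to get Z = XV. The sketch below assumes the standardized PCA used in this lab (prcomp(..., scale. = TRUE)). `pca_by_eigen()` is a hypothetical helper, and the built-in `USArrests` data stands in for `dat` so that the block runs on its own. The loadings and scores are compared in absolute value because the sign of each column is arbitrary.

```r
# PCA "by hand": eigen-decomposition of the correlation matrix.
# pca_by_eigen() is a hypothetical helper for illustration.
pca_by_eigen <- function(df) {
  X <- scale(as.matrix(df))        # center and scale (sd uses divisor n - 1)
  R <- cor(df)                     # equals crossprod(X) / (n - 1)
  e <- eigen(R, symmetric = TRUE)  # eigenvalues in decreasing order
  V <- e$vectors
  dimnames(V) <- list(colnames(df), paste0("PC", seq_len(ncol(V))))
  Z <- X %*% V                     # scores: Z = XV
  list(eigenvalues = e$values, loadings = V, scores = Z)
}

by_hand <- pca_by_eigen(USArrests)
ref <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues agree with the squared standard deviations from prcomp()
all.equal(by_hand$eigenvalues, ref$sdev^2)
# Loadings and scores agree up to the sign of each column
all.equal(abs(by_hand$loadings), abs(ref$rotation), check.attributes = FALSE)
all.equal(abs(by_hand$scores), abs(ref$x), check.attributes = FALSE)
```

The eigenvalues of a correlation matrix always sum to the number of variables (4 here, 12 for `dat`). That total is the reference point for the percentages of variance used later in the lab.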
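Note: The signs of the columns of the loadings and scores are arbitrary, so they may differ between programs for PCA, and even between different builds of R. Look around at the output of your neighbors to see who has results similar to yours and who has different outputs.
Quickly inspect the objects created above:
• How many eigenvalues are almost zero (or zero)? Two. λ11 ≈ 1.3e-04 is almost zero, and λ12 ≈ 6e-33 is zero up to floating-point error.
• What about the loadings associated with the 12th PC? They are essentially zero for every variable except wins and losses, which both load 0.707 (= 1/√2).
• What about the 12th PC scores? They are all zero up to floating-point error (of order 1e-16).
• Can you guess what's going on with the values of the 12th dimension? Every team played 82 games, so wins + losses = 82 for all teams. After standardizing, losses is exactly the negative of wins, so the combination (wins + losses)/√2 picked out by the 12th loading vector is identically zero. In other words, the 12 variables span only 11 dimensions, and the 12th PC carries no variance. The 11th eigenvalue is small for a similar reason: points ≈ 2 × field_goals + points3 + free_throws (up to rounding), and the PC11 loadings are concentrated on exactly those four variables.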
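Near-zero eigenvalues point to exact or near-exact linear relations among the columns. The check below uses the first five teams, copied from the print(dat) output above. It looks for two relations. First, wins + losses should be constant, since str() shows that every team played 82 games. Second, points should equal 2 × field_goals + points3 + free_throws up to rounding, because a three-pointer counts as a field goal and earns one extra point. With the full data you would run the same lines on `dat`.

```r
# First five teams, copied from print(dat) above
five <- data.frame(
  wins        = c(67, 61, 55, 53, 51),
  losses      = c(15, 21, 27, 29, 31),
  points      = c(115.9, 105.3, 115.3, 108.0, 100.7),
  field_goals = c(43.1, 39.3, 40.3, 38.6, 37.0),
  points3     = c(12.0, 9.2, 14.4, 12.0, 9.6),
  free_throws = c(17.8, 17.6, 20.3, 18.7, 17.1)
)

# Exact relation: every team plays 82 games
games <- five$wins + five$losses
print(games)

# After standardizing, losses is exactly minus wins, so their sum is zero.
# This is the (wins + losses) / sqrt(2) direction carried by the 12th PC
z_sum <- scale(five$wins) + scale(five$losses)
print(max(abs(z_sum)))

# Near-exact relation: points = 2 * field_goals + points3 + free_throws,
# off only because per-game averages are rounded to one decimal
resid <- five$points - (2 * five$field_goals + five$points3 + five$free_throws)
print(round(resid, 1))
```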
Your turn
• Compare the results of prcomp() against those of princomp() in terms of eigenvalues, loadings, and PCs.
# princomp() with cor = TRUE also works with the correlation matrix
pca_princomp <- princomp(dat, cor = TRUE)
eigenvalues2 <- pca_princomp$sdev^2
print(eigenvalues2)
## [1] 4.164615e+00 2.061621e+00 1.377660e+00 1.339038e+00 9.445148e-01
## [6] 8.266370e-01 5.418803e-01 3.172210e-01 2.634804e-01 1.632044e-01
## [11] 1.273592e-04 6.167771e-33
# Use a new name so the prcomp() loadings in V are kept for comparison
V2 <- pca_princomp$loadings
print(V2)
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## wins 0.398 0.156 0.107 -0.153 0.468 0.190
## losses -0.398 -0.156 -0.107 0.153 -0.468 -0.190
## points 0.420 -0.199 -0.309 -0.113 0.151
## field_goals 0.357 -0.238 0.293 -0.410 0.169 0.452
## points3 0.283 0.242 -0.439 -0.264 -0.163 0.370 -0.385
## free_throws 0.167 -0.345 -0.426 0.529 -0.487
## off_rebounds -0.442 0.456 0.443 -0.133 -0.590
## def_rebounds 0.213 0.200 -0.103 0.602 -0.266 0.276 0.130
## assists 0.369 0.121 -0.102 -0.330 -0.401 -0.296 -0.419
## steals 0.208 -0.368 0.392 -0.337 -0.150 -0.299 0.128
## blocks 0.196 0.559 0.136 -0.334 0.437 0.327 -0.248
## personal_fouls -0.547 -0.119 -0.277 0.113 0.550
## Comp.9 Comp.10 Comp.11 Comp.12
## wins 0.148 0.707
## losses -0.148 0.707
## points 0.172 -0.200 0.750
## field_goals 0.232 -0.518
## points3 -0.207 -0.388 -0.295
## free_throws 0.244 -0.286
## off_rebounds -0.139 -0.121
## def_rebounds -0.566 0.239
## assists 0.109 0.538
## steals -0.560 -0.332
## blocks 0.346 -0.191
## personal_fouls -0.165 0.503
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
## Cumulative Var 0.083 0.167 0.250 0.333 0.417 0.500 0.583 0.667
## Comp.9 Comp.10 Comp.11 Comp.12
## SS loadings 1.000 1.000 1.000 1.000
## Proportion Var 0.083 0.083 0.083 0.083
## Cumulative Var 0.750 0.833 0.917 1.000
# Use a new name so the prcomp() scores in Z are kept for comparison
Z2 <- pca_princomp$scores
print(Z2)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## 1 7.23644888 -0.33030394 2.32966665 -0.34721212 -1.464375751
## 2 2.18025415 0.98955622 1.85070553 0.08738417 0.876908911
## 3 3.93456619 -0.55444337 -2.34024534 -0.91911088 0.285272665
## 4 1.58429347 0.76186037 -1.36920017 -1.69441165 0.331351758
## 5 -0.92914094 2.22107971 0.21537162 0.05843287 1.186579503
## 6 0.29008641 -1.44543078 0.07768950 -0.60776889 1.507512339
## 7 1.72637760 2.27309493 -2.19819268 0.40351079 0.568427574
## 8 1.32348093 0.53124144 -1.24378781 -0.35610253 0.393777984
## 9 1.45543362 -1.31779875 0.22486284 -0.77077419 0.791992836
## 10 0.55582403 -1.55298432 0.10642207 1.27355454 1.080756306
## 11 -1.73118430 -0.88636943 -0.09996612 -1.77255576 1.108271726
## 12 0.11629556 0.54733831 0.78689628 0.25606677 -0.033426560
## 13 0.07748000 0.01555967 0.88460064 -0.56634845 -0.327146179
## 14 -0.20047854 0.04346017 1.46939945 -1.60308670 -0.432421927
## 15 -0.45939121 -0.08212687 0.92939308 1.68642856 0.907856437
## 16 0.09639065 -0.29874572 -0.96975280 0.31177793 -0.002953529
## 17 -0.74527335 0.49236355 1.22053509 0.38225086 0.300870734
## 18 1.58173358 -0.26673932 -1.97392066 1.94267353 0.180081394
## 19 -1.71698156 1.50615366 0.68528319 2.25822700 1.228336420
## 20 -0.22089939 2.08852757 -0.87232161 0.90039376 -1.147861041
## 21 -0.10020862 1.40574255 0.97751152 0.96177829 -1.756645998
## 22 -3.34643069 2.24281313 0.11092237 -2.69955297 0.232691584
## 23 -1.79367141 0.23085276 -0.50941975 -1.19419486 -0.552866051
## 24 -1.11596347 -2.10719695 0.30784553 -0.32627447 0.182968665
## 25 -1.23035781 -0.77983338 0.75058200 1.62853608 -0.026324827
## 26 -2.15890548 0.88670045 0.61414435 0.62753264 -0.584211915
## 27 -1.28150468 -0.73486898 0.71212786 -1.04140107 -1.671590453
## 28 -2.00940751 -1.57364418 -0.17694716 -0.06094682 0.147079293
## 29 -1.54281219 -4.38275466 -0.65250261 0.55940170 -0.690592406
## 30 -1.57605394 0.07689615 -1.84770288 0.62179188 -2.620319490
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## 1 -0.56923867 0.323378131 0.006421975 -0.22590340 0.02371145
## 2 1.08252405 -0.127763013 -0.144643755 0.24377893 0.09350612
## 3 -0.38171247 -0.024063018 -0.922902119 -0.34761012 -0.85322941
## 4 -0.14739609 0.047440448 -0.387554415 0.06458919 0.80321734
## 5 1.64459802 0.539131413 -0.158180331 0.15449766 0.26172587
## 6 1.59138757 -0.020237069 0.902689863 0.02835115 -0.87444908
## 7 -0.49502550 0.730940620 0.240946641 0.17304470 -0.29941269
## 8 0.57031340 -0.267901893 0.949867744 0.04205948 0.38059187
## 9 -1.44884041 0.090857449 1.044604551 -0.26898366 0.37955184
## 10 1.11649515 -0.007853046 -0.108607279 -0.49253386 0.14964123
## 11 0.81557553 0.005523402 -0.899037165 -0.80486392 0.60416692
## 12 0.39706017 -1.299327806 -0.350646640 -0.67595073 -0.02263462
## 13 0.20464459 -0.453467402 0.853648493 0.21435383 -0.14621562
## 14 -0.23872686 -0.104949498 0.104609892 0.88637651 0.23940502
## 15 0.19282017 -1.403511923 -0.724190577 -0.31117074 -0.15835797
## 16 0.62356860 1.168079184 0.154295484 0.66012143 0.03924263
## 17 -0.03473496 1.920490950 -0.535344327 0.35424909 -0.24339104
## 18 -1.48141737 -0.335727408 -0.402912570 0.34605763 0.48211352
## 19 -1.73497883 0.217762026 0.746870875 -0.82548628 0.29133199
## 20 1.02902454 -1.420059448 -0.130967961 0.44306207 -0.24803892
## 21 0.29879284 -0.002567281 0.614838246 -0.34073613 -0.30511839
## 22 -1.07722563 0.212029416 -0.008984213 -0.15382699 -0.60234159
## 23 -0.27386809 -0.645427836 0.456162988 0.23926861 0.38960280
## 24 -0.91544057 -1.427628489 -0.342224622 1.34904307 -0.08037843
## 25 -0.38484132 0.976217946 -0.825708736 0.72844331 -0.17856938
## 26 -0.50357255 0.214434871 -0.108602865 0.28356506 0.22943621
## 27 -0.23424371 0.416437467 -0.805847960 -0.82136953 0.18025876
## 28 -1.45007410 -0.170448222 0.104237022 -0.48378462 -0.77448185
## 29 0.80266306 0.600494488 0.513344005 -0.05059704 0.19388545
## 30 1.00186943 0.247715541 0.163817759 -0.40804471 0.04522998
## Comp.11 Comp.12
## 1 -0.0038355378 -5.942721e-15
## 2 -0.0127457604 -2.669569e-14
## 3 -0.0082253121 -1.615294e-14
## 4 0.0149805596 2.913457e-14
## 5 -0.0015710286 -5.920484e-15
## 6 -0.0077052056 -1.741749e-14
## 7 -0.0080061944 -1.635149e-14
## 8 0.0102759609 1.989617e-14
## 9 0.0194677179 3.843099e-14
## 10 -0.0013973232 -2.953595e-15
## 11 -0.0013999707 -3.594482e-15
## 12 -0.0007882574 -1.950876e-15
## 13 -0.0011697155 -2.491608e-15
## 14 0.0044553514 8.375594e-15
## 15 0.0183644222 3.446904e-14
## 16 -0.0033602816 -6.367749e-15
## 17 0.0268035274 5.008996e-14
## 18 -0.0045838018 -6.394542e-15
## 19 -0.0157008400 -3.050795e-14
## 20 0.0091694360 1.757720e-14
## 21 0.0014012343 3.347594e-15
## 22 -0.0031832132 -8.635171e-15
## 23 -0.0242342632 -4.577528e-14
## 24 -0.0027669724 -4.453825e-15
## 25 -0.0129274330 -2.473427e-14
## 26 0.0022773653 4.842052e-15
## 27 -0.0136064307 -2.451377e-14
## 28 0.0149714793 2.939253e-14
## 29 -0.0043841447 -5.344336e-15
## 30 0.0094246317 2.100339e-14
• If you look carefully at the princomp() loadings, you should notice that some values are left blank. Why is this? Check the documentation: ?princomp (and ?loadings for the print method).
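A: Small loadings are conventionally not printed (they are replaced by spaces) to draw the eye to the pattern of the larger loadings. By default, the print method for loadings objects hides values whose absolute value is below cutoff = 0.1; print(pca_princomp$loadings, cutoff = 0) shows them all.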
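The printed outputs suggest three systematic differences between prcomp() and princomp(). The eigenvalues coincide, and the loadings agree up to column signs. The princomp() scores are slightly larger in absolute value. For team 1 on the first component, the scores are 7.23644888 versus -7.11481922, a ratio of about 1.0171 = √(30/29). This factor arises because princomp() standardizes with divisor n instead of n - 1. The sketch below checks all three differences on the built-in `USArrests` data so that it runs on its own; substituting `dat` checks the lab results.

```r
# Compare prcomp() and princomp() on the same standardized PCA
pc1 <- prcomp(USArrests, scale. = TRUE)
pc2 <- princomp(USArrests, cor = TRUE)
n <- nrow(USArrests)

# 1. Eigenvalues: identical (both come from the correlation matrix)
same_eig <- all.equal(pc1$sdev^2, unname(pc2$sdev^2))

# 2. Loadings: identical up to the sign of each column
same_load <- all.equal(abs(unclass(pc1$rotation)), abs(unclass(pc2$loadings)),
                       check.attributes = FALSE)

# 3. Scores: princomp() standardizes with divisor n instead of n - 1,
#    so its scores are larger by a factor sqrt(n / (n - 1))
same_scores <- all.equal(abs(pc2$scores), abs(pc1$x) * sqrt(n / (n - 1)),
                         check.attributes = FALSE)

print(c(same_eig, same_load, same_scores))
```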
Your turn
What are the differences between prcomp() and princomp()? Spend some time reading the help documentation of both functions to find out the main differences between them. Are there any cases when it would be better to use one function or the other?
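A: princomp() computes the PCs with an eigen-decomposition (eigen()) of the covariance or correlation matrix, for compatibility with S-PLUS. prcomp() instead uses a singular value decomposition (svd()) of the centered (and optionally scaled) data matrix, which is the preferred method for numerical accuracy. princomp() also uses divisor n rather than n - 1 for the covariance matrix, which is why its scores above are slightly larger in absolute value than those of prcomp(). Finally, princomp() only handles so-called R-mode PCA, that is, feature extraction of variables. If a data matrix is supplied (possibly via a formula), it must have at least as many units as variables. For Q-mode PCA, use prcomp(). In short, prcomp() is generally the better default, and it is the only option when there are more variables than observations. princomp() is useful when you need S-PLUS-compatible results or want to start from a covariance matrix through its covmat argument.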
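One case where the choice is forced is a data set with more variables than observations. prcomp() still works there, while princomp() stops with an error. The check below uses simulated data; the seed and the 5 × 10 dimensions are arbitrary choices for illustration.

```r
# A wide data set: more variables (p = 10) than observations (n = 5)
set.seed(154)
wide <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)

# prcomp() works through svd() of the centered data
pc_wide <- prcomp(wide)
print(round(pc_wide$sdev, 4))

# princomp() refuses: it needs at least as many units as variables
res <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))
print(res)
```

Only n - 1 = 4 of the components have a nonzero standard deviation, because centering 5 observations leaves at most 4 independent directions.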
Your turn
Compute a table containing the eigenvalues, the variance in terms of percentages, and the cumulative percentages, like the table below. Analysts typically look at a bar chart of the eigenvalues (see figure below). Plot your own bar chart.
# Percentage of the total variance captured by each PC, and its running total
per <- eigenvalues / sum(eigenvalues) * 100
cum_per <- cumsum(per)
data.frame(eigenvalue = eigenvalues, percentage = per,
           cumulative.percentage = cum_per)

# Bar chart of the eigenvalues (scree plot)
barplot(eigenvalues, names.arg = paste0("PC", seq_along(eigenvalues)),
        las = 2, ylab = "Eigenvalue")
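• How much of the variation in the data is captured by the first PC? About 34.7% (λ1 = 4.16 out of a total of 12, the number of standardized variables).
• How much of the variation in the data is captured by the second PC? About 17.2% (λ2 = 2.06 out of 12).
• How much of the variation in the data is captured by the first two PCs? About 51.9% (4.16 + 2.06 = 6.23 out of 12).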
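The introduction asks how many components to retain. Two common rules of thumb are shown below. The first, the Kaiser criterion, keeps components with an eigenvalue greater than 1, since each standardized variable contributes variance 1. The second keeps enough components to reach a target share of the variance. The sketch applies both rules to the eigenvalues printed above, copied by hand into `eig`. The 70% target is an arbitrary choice, and neither rule is presented as the lab's prescribed method.

```r
# Eigenvalues copied from print(eigenvalues) above (correlation-matrix PCA)
eig <- c(4.164615, 2.061621, 1.377660, 1.339038, 0.9445148,
         0.8266370, 0.5418803, 0.3172210, 0.2634804, 0.1632044,
         1.273592e-04, 6.167771e-33)

# Proportion of variance: each eigenvalue over the total (about 12)
prop <- eig / sum(eig)
cum_prop <- cumsum(prop)

# Rule 1 (Kaiser): keep PCs that explain more than one variable's worth of variance
k_kaiser <- sum(eig > 1)

# Rule 2: smallest number of PCs reaching 70% cumulative variance
k_70 <- which(cum_prop >= 0.70)[1]

print(round(100 * cum_prop[1:5], 1))
cat("Kaiser rule:", k_kaiser, " 70% rule:", k_70, "\n")
```

On these eigenvalues, both rules suggest keeping four components, which together capture about 74.5% of the variance.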
Your turn
• Calculate a matrix (or table) of correlations between the variables and the PCs. In other words, what are the correlations of the variables with the 1st PC, with the 2nd PC, and so on?
# Correlations between the original variables (rows) and the PCs (columns)
cc <- cor(dat, pca_prcomp$x)
print(round(cc, 2))
library(factoextra)
fviz_pca_var(pca_prcomp, col.var = "black")
• What variables seem to be more correlated with PC1?
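points, wins and losses (with opposite signs), assists, and field_goals, all with |r| above 0.7 (for example, |r| ≈ 0.42 × √4.16 ≈ 0.86 for points).
• What variables seem to be more correlated with PC2?
personal_fouls (|r| ≈ 0.78), followed by off_rebounds (≈ 0.64), steals (≈ 0.53), and free_throws (≈ 0.50).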
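For a PCA on standardized variables, the correlation between variable j and PC k equals the loading times the PC's standard deviation: cor(x_j, z_k) = v_jk √λk. This holds because cov(x_j, z_k) = λk v_jk and var(z_k) = λk. The correlation table can therefore be read straight off the prcomp() output. The sketch below verifies this on the built-in `USArrests` data. On the lab data, `sweep(pca_prcomp$rotation, 2, pca_prcomp$sdev, "*")` should reproduce `cc`.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Correlations computed directly, as in cor(dat, pca_prcomp$x)
cc_direct <- cor(USArrests, pca$x)

# Same matrix from the PCA output: each loading times its PC's standard deviation
cc_formula <- sweep(pca$rotation, 2, pca$sdev, "*")

all.equal(cc_direct, cc_formula, check.attributes = FALSE)

# Variables most correlated with PC1, strongest first
sort(abs(cc_formula[, "PC1"]), decreasing = TRUE)
```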
Your turn
Begin with a scatterplot of the first two PCs (see figure below).
library(plotly)
# data frame for plot_ly()
scores_df <- cbind.data.frame(
pca_prcomp$x,
team = dataset$team, stringsAsFactors = FALSE
)
# scatter plot
plot_ly(data = scores_df, x = ~PC1, y = ~PC2, type = 'scatter',
mode = 'markers',
text = ~team,
marker = list(size = 10))
• Also plot PC1 against PC3, and then PC2 against PC3. If you want, continue visualizing other scatterplots.
plot_ly(data = scores_df, x = ~PC1, y = ~PC3, type = 'scatter',
mode = 'markers',
text = ~team,
marker = list(size = 10))
plot_ly(data = scores_df, x = ~PC2, y = ~PC3, type = 'scatter',
mode = 'markers',
text = ~team,
marker = list(size = 10))
• What patterns do you see?
• Try adding numeric labels to the points to see which observations seem to be potential outliers.
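For static plots, base R's text() can draw each observation's row number in place of a marker. The sketch below uses the built-in `USArrests` data; swapping in `pca_prcomp$x` gives the lab version. In the lab data, the score table above already shows two teams far from the rest: row 1 (Golden State Warriors, PC1 ≈ -7.1) and row 29 (PC2 ≈ -4.3). Ranking observations by their distance from the origin in the PC1-PC2 plane is one simple, informal way to flag such candidates. It is not a formal outlier test.

```r
pca <- prcomp(USArrests, scale. = TRUE)
scores <- pca$x

# Draw an empty frame, then write each row number at its score position
plot(scores[, "PC1"], scores[, "PC2"], type = "n",
     xlab = "PC1", ylab = "PC2", main = "Scores labeled by row number")
abline(h = 0, v = 0, lty = 3, col = "gray")
text(scores[, "PC1"], scores[, "PC2"], labels = seq_len(nrow(scores)), cex = 0.8)

# Distance from the origin in the PC1-PC2 plane; largest values first
dist12 <- sqrt(scores[, "PC1"]^2 + scores[, "PC2"]^2)
head(sort(dist12, decreasing = TRUE), 3)
```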
# 3d scatter plot
plot_ly(data = scores_df, x = ~PC1, y = ~PC2, z = ~PC3, type = 'scatter3d',
mode = 'markers', text = ~team)
Your turn
Graph various biplots with biplot(), using different values of scale (e.g. 0, 0.3, 0.5, 1). How do the relative positions of the arrows change with respect to the points? Under which value of scale do you find the biplot easiest to read?
biplot(pca_prcomp, scale = 0)
biplot(pca_prcomp, scale = 0.3)
biplot(pca_prcomp, scale = 0.5)
biplot(pca_prcomp, scale = 1)
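The effect of the scale argument is easier to see by computing the coordinates biplot() draws. The sketch below follows the scaling used by biplot.prcomp() (see ?biplot.prcomp) for the two plotted PCs. The points are the scores divided by λ^scale, and the arrows are the loadings multiplied by λ^scale, where λ = sdev × √n. `biplot_coords()` is a hypothetical helper written for illustration, not part of R. Also note that biplot() draws the arrows on their own secondary axes, so only their directions and relative lengths matter.

```r
# Coordinates that biplot() draws for a prcomp object.
# biplot_coords() is a hypothetical helper for illustration.
biplot_coords <- function(pca, scale = 1, choices = 1:2) {
  n <- nrow(pca$x)
  lam <- (pca$sdev[choices] * sqrt(n))^scale
  list(
    obs = sweep(pca$x[, choices], 2, lam, "/"),        # points
    var = sweep(pca$rotation[, choices], 2, lam, "*")  # arrows
  )
}

pca <- prcomp(USArrests, scale. = TRUE)
n <- nrow(USArrests)
b0 <- biplot_coords(pca, scale = 0)
b1 <- biplot_coords(pca, scale = 1)

# scale = 0: points are the raw scores, arrows are the raw loadings
all.equal(b0$obs, pca$x[, 1:2])
# scale = 1: arrows are proportional to the variable-PC correlations
all.equal(b1$var / sqrt(n), cor(USArrests, pca$x)[, 1:2],
          check.attributes = FALSE)
```

Intermediate values such as 0.3 or 0.5 interpolate between these two extremes. They shift the stretch between the PC1 and PC2 directions from the arrows to the points.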